abnormality detection


Google-MedGemma Based Abnormality Detection in Musculoskeletal Radiographs

Maity, Soumyajit, Kamboj, Pranjal, Maity, Sneha, Singh, Rajat, Chatterjee, Sankhadeep

arXiv.org Artificial Intelligence

This paper proposes a MedGemma-based framework for automatic abnormality detection in musculoskeletal radiographs. Departing from conventional autoencoder and neural network pipelines, the proposed method leverages the MedGemma foundation model, incorporating a SigLIP-derived vision encoder pretrained on diverse medical imaging modalities. Preprocessed X-ray images are encoded into high-dimensional embeddings by the MedGemma vision backbone, and these embeddings are then passed through a lightweight multilayer perceptron for binary classification. Experimental assessment shows that the MedGemma-driven classifier performs strongly, surpassing conventional convolutional and autoencoder-based baselines. Additionally, the model benefits from MedGemma's transfer learning capabilities, which improve generalization and reduce the need for manual feature engineering. The integration of a modern medical foundation model not only enhances representation learning but also enables modular training strategies such as selective unfreezing of encoder blocks for efficient domain adaptation. The findings suggest that MedGemma-powered classification systems can advance clinical radiograph triage by providing scalable and accurate abnormality detection, with potential for broader applications in automated medical image analysis. Keywords: Google MedGemma, MURA, Medical Image, Classification.
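
A minimal sketch of the pipeline this abstract describes, assuming a frozen vision backbone that returns one embedding per image and a small MLP head for the abnormal/normal decision. The stand-in encoder, the 1152-dimensional embedding size (typical of SigLIP), and the `blocks` attribute used for selective unfreezing are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AbnormalityClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int = 1152, unfreeze_last_n: int = 0):
        super().__init__()
        self.encoder = encoder
        # Freeze the whole backbone by default (pure transfer learning).
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Optionally unfreeze the last N encoder blocks for domain adaptation,
        # assuming the backbone exposes an iterable `blocks` attribute.
        if unfreeze_last_n > 0 and hasattr(self.encoder, "blocks"):
            for block in list(self.encoder.blocks)[-unfreeze_last_n:]:
                for p in block.parameters():
                    p.requires_grad = True
        # Lightweight MLP head: embedding -> hidden layer -> abnormal/normal logit.
        self.head = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, 256),
            nn.GELU(),
            nn.Dropout(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        emb = self.encoder(pixel_values)      # (B, embed_dim) image embedding
        return self.head(emb).squeeze(-1)     # (B,) abnormality logit

# Toy usage with a tiny stand-in encoder; in practice this would be the MedGemma vision tower,
# with only the head (and optionally the last few encoder blocks) trained on MURA-style labels.
dummy_encoder = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=7, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1152),
)
model = AbnormalityClassifier(dummy_encoder)
probs = torch.sigmoid(model(torch.randn(4, 3, 224, 224)))   # per-image abnormality probability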


OmniMRI: A Unified Vision-Language Foundation Model for Generalist MRI Interpretation

He, Xingxin, Rofena, Aurora, Feng, Ruimin, Liao, Haozhe, Zhou, Zhaoye, Jang, Albert, Liu, Fang

arXiv.org Artificial Intelligence

Magnetic Resonance Imaging (MRI) is indispensable in clinical practice but remains constrained by fragmented, multi-stage workflows encompassing acquisition, reconstruction, segmentation, detection, diagnosis, and reporting. While deep learning has achieved progress on individual tasks, existing approaches are often anatomy- or application-specific and lack generalizability across diverse clinical settings. Moreover, current pipelines rarely integrate imaging data with the complementary language information that radiologists rely on in routine practice. Here, we introduce OmniMRI, a unified vision-language foundation model designed to generalize across the entire MRI workflow. OmniMRI is trained on a large-scale, heterogeneous corpus curated from 60 public datasets comprising over 220,000 MRI volumes and 19 million MRI slices, incorporating image-only data, paired vision-text data, and instruction-response data. Its multi-stage training paradigm, comprising self-supervised vision pretraining, vision-language alignment, multimodal pretraining, and multi-task instruction tuning, progressively equips the model with transferable visual representations, cross-modal reasoning, and robust instruction-following capabilities. Qualitative results demonstrate OmniMRI's ability to perform diverse tasks within a single architecture, including MRI reconstruction, anatomical and pathological segmentation, abnormality detection, diagnostic suggestion, and radiology report generation. These findings highlight OmniMRI's potential to consolidate fragmented pipelines into a scalable, generalist framework, paving the way toward foundation models that unify imaging and clinical language for comprehensive, end-to-end MRI interpretation.
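
The multi-stage training paradigm reads naturally as a staged curriculum in which the data source and the objective change per stage. The skeleton below is illustrative only: the stage names, losses, and toy model stand in for OmniMRI's actual components.

from dataclasses import dataclass
from typing import Callable, Iterable

import torch
import torch.nn as nn

@dataclass
class Stage:
    name: str            # e.g. "self-supervised vision pretraining"
    data: Iterable       # batches for this stage (image-only, image-text, or instruction-response)
    loss_fn: Callable    # stage-specific objective
    epochs: int

def train_curriculum(model, stages, optimizer_factory):
    for stage in stages:
        optimizer = optimizer_factory(model)        # fresh optimizer/schedule per stage
        for _ in range(stage.epochs):
            for batch in stage.data:
                loss = stage.loss_fn(model, batch)   # e.g. reconstruction, contrastive, next-token losses
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return model

# Toy usage: two stages over random data, standing in for the four-stage curriculum.
model = nn.Linear(8, 8)
batches = [(torch.randn(4, 8), torch.randn(4, 8)) for _ in range(5)]
stages = [
    Stage("self-supervised vision pretraining", batches,
          lambda m, b: nn.functional.mse_loss(m(b[0]), b[0]), epochs=1),   # reconstruct the input
    Stage("multi-task instruction tuning", batches,
          lambda m, b: nn.functional.mse_loss(m(b[0]), b[1]), epochs=1),   # fit the paired target
]
train_curriculum(model, stages, lambda m: torch.optim.Adam(m.parameters(), lr=1e-3))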


Is ChatGPT-5 Ready for Mammogram VQA?

Li, Qiang, Wang, Shansong, Hu, Mingzhe, Safari, Mojtaba, Eidex, Zachary, Yang, Xiaofeng

arXiv.org Artificial Intelligence

Mammogram visual question answering (VQA) integrates image interpretation with clinical reasoning and has the potential to support breast cancer screening. We systematically evaluated the GPT-5 family and the GPT-4o model on four public mammography datasets (EMBED, InBreast, CMMD, CBIS-DDSM) for BI-RADS assessment, abnormality detection, and malignancy classification tasks. GPT-5 was consistently the best-performing model among the GPT variants but lagged behind both human experts and domain-specific fine-tuned models. On EMBED, GPT-5 achieved the highest scores among GPT variants in density (56.8%), distortion (52.5%), mass (64.5%), calcification (63.5%), and malignancy (52.8%) classification. On InBreast, it attained 36.9% BI-RADS accuracy, 45.9% abnormality detection, and 35.0% malignancy classification. On CMMD, GPT-5 reached 32.3% abnormality detection and 55.0% malignancy accuracy. On CBIS-DDSM, it achieved 69.3% BI-RADS accuracy, 66.0% abnormality detection, and 58.2% malignancy accuracy. Compared with human expert estimates, GPT-5 exhibited lower sensitivity (63.5%) and specificity (52.3%). While GPT-5 exhibits promising capabilities for screening tasks, its performance remains insufficient for high-stakes clinical imaging applications without targeted domain adaptation and optimization. However, the substantial improvement in performance from GPT-4o to GPT-5 shows a promising trend in the potential for general large language models (LLMs) to assist with mammography VQA tasks.
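
The evaluation protocol implied here, posing one question per image per task and scoring accuracy, can be summarized by a loop of the following shape. `ask_model`, the task names, and the exact-match scoring rule are placeholders for illustration, not the authors' harness.

from collections import defaultdict

def evaluate_vqa(ask_model, cases):
    """cases: iterable of dicts with 'image', 'task', 'question', and 'answer' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        prediction = ask_model(case["image"], case["question"])   # model's free-text answer
        total[case["task"]] += 1
        if prediction.strip().lower() == case["answer"].strip().lower():
            correct[case["task"]] += 1
    return {task: correct[task] / total[task] for task in total}  # per-task accuracy

# Toy usage with a stand-in model that always answers "BI-RADS 2".
accuracy = evaluate_vqa(
    lambda image, question: "BI-RADS 2",
    [{"image": None, "task": "birads", "question": "Assign a BI-RADS category.", "answer": "BI-RADS 2"}],
)
print(accuracy)   # {'birads': 1.0}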


CwA-T: A Channelwise AutoEncoder with Transformer for EEG Abnormality Detection

Zhao, Youshen, Iramina, Keiji

arXiv.org Artificial Intelligence

Brain disorders such as Alzheimer's disease, epilepsy, and Parkinson's disease have attracted significant research interest due to their profound impact on patients' quality of life and on healthcare systems globally [1, 2]. Timely and accurate diagnosis is crucial for effective intervention and management, necessitating reliable tools capable of capturing the dynamic changes in brain activity. Electroencephalography (EEG), a cost-effective and non-invasive method for real-time monitoring of brain function, has become a cornerstone of clinical practice for detecting brain disorders. By measuring electrical activity in the brain, EEG provides valuable insights into neural dynamics, particularly for conditions like epilepsy and Alzheimer's disease, where the identification of abnormal patterns is critical for diagnosis and treatment. Recent advances in deep learning (DL) have significantly enhanced the capabilities of computer-aided diagnosis (CAD) systems for EEG analysis. These systems excel at extracting complex, high-dimensional features from raw EEG signals, improving diagnostic accuracy across various applications [3, 4, 5].


Unlocking the Potential of Weakly Labeled Data: A Co-Evolutionary Learning Framework for Abnormality Detection and Report Generation

Sun, Jinghan, Wei, Dong, Xu, Zhe, Lu, Donghuan, Liu, Hong, Wang, Hong, Tsaftaris, Sotirios A., McDonagh, Steven, Zheng, Yefeng, Wang, Liansheng

arXiv.org Artificial Intelligence

Anatomical abnormality detection and report generation for chest X-rays (CXR) are two essential tasks in clinical practice. The former aims at localizing and characterizing cardiopulmonary radiological findings in CXRs, while the latter summarizes the findings in a detailed report for further diagnosis and treatment. Existing methods have often focused on either task separately, ignoring their correlation. This work proposes a co-evolutionary abnormality detection and report generation (CoE-DG) framework. The framework utilizes both fully labeled (with bounding box annotations and clinical reports) and weakly labeled (with reports only) data to achieve mutual promotion between the abnormality detection and report generation tasks. Specifically, we introduce a bi-directional information interaction strategy with generator-guided information propagation (GIP) and detector-guided information propagation (DIP). For semi-supervised abnormality detection, GIP takes the informative feature extracted by the generator as an auxiliary input to the detector and uses the generator's prediction to refine the detector's pseudo labels. We further propose an intra-image-modal self-adaptive non-maximum suppression module (SA-NMS), which dynamically rectifies the pseudo detection labels generated by the teacher detection model with high-confidence predictions from the student. Conversely, for report generation, DIP takes the abnormalities' categories and locations predicted by the detector as input and guidance for the generator to improve the generated reports.
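
A minimal sketch of the self-adaptive NMS idea as described: pool the teacher's pseudo detections with the student's high-confidence predictions and keep the strongest non-overlapping boxes as refined pseudo labels. The thresholds and the exact rectification rule are assumptions for illustration, not the CoE-DG code.

import torch
from torchvision.ops import nms

def self_adaptive_nms(teacher_boxes, teacher_scores,
                      student_boxes, student_scores,
                      student_conf_thresh=0.9, iou_thresh=0.5):
    # Keep only student predictions confident enough to rectify the teacher's pseudo labels.
    keep = student_scores >= student_conf_thresh
    boxes = torch.cat([teacher_boxes, student_boxes[keep]])
    scores = torch.cat([teacher_scores, student_scores[keep]])
    kept = nms(boxes, scores, iou_thresh)     # suppress duplicates across the two sources
    return boxes[kept], scores[kept]

# Toy usage: a confident student box overlapping the teacher's pseudo label replaces it,
# and a second confident student box adds a finding the teacher missed.
teacher_boxes = torch.tensor([[10., 10., 60., 60.]])
teacher_scores = torch.tensor([0.60])
student_boxes = torch.tensor([[12., 11., 58., 62.], [100., 100., 150., 160.]])
student_scores = torch.tensor([0.95, 0.97])
boxes, scores = self_adaptive_nms(teacher_boxes, teacher_scores, student_boxes, student_scores)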


AI Readiness in Healthcare through Storytelling XAI

Dubey, Akshat, Yang, Zewen, Hattab, Georges

arXiv.org Artificial Intelligence

Artificial Intelligence is rapidly advancing and radically impacting everyday life, driven by the increasing availability of computing power. Despite this trend, the adoption of AI in real-world healthcare is still limited. One of the main reasons is concern about the trustworthiness of AI models and the resulting hesitation of domain experts to rely on model predictions. Explainable Artificial Intelligence (XAI) techniques aim to address these issues. However, explainability can mean different things to people with different backgrounds, expertise, and goals. To address target audiences with diverse needs, we develop storytelling XAI. In this research, we have developed an approach that combines multi-task distillation with interpretability techniques to enable audience-centric explainability. Multi-task distillation allows the model to exploit the relationships between tasks: each task supports the others, enhancing interpretability from the perspective of a domain expert. The distillation process also allows us to extend this research to large, highly complex deep models. We focus on both model-agnostic and model-specific methods of interpretability, supported by textual justification of the results in healthcare through our use case. Our methods increase the trust of both domain experts and machine learning experts, enabling responsible AI.
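
As a rough illustration of the kind of objective multi-task distillation builds on, the sketch below has a student match a teacher's softened outputs on each task while also fitting the task labels. The temperature, weighting, and task names are illustrative assumptions, not the paper's settings.

import torch
import torch.nn.functional as F

def multitask_distillation_loss(student_logits, teacher_logits, labels,
                                temperature=2.0, alpha=0.5):
    """All three arguments are dicts keyed by task name."""
    loss = 0.0
    for task in student_logits:
        # Soft targets: KL divergence between softened teacher and student distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits[task] / temperature, dim=-1),
            F.softmax(teacher_logits[task] / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: standard cross-entropy against the task labels.
        hard = F.cross_entropy(student_logits[task], labels[task])
        loss = loss + alpha * soft + (1 - alpha) * hard
    return loss

# Toy usage with a single hypothetical "diagnosis" task over 3 classes.
student = {"diagnosis": torch.randn(4, 3)}
teacher = {"diagnosis": torch.randn(4, 3)}
targets = {"diagnosis": torch.randint(0, 3, (4,))}
loss = multitask_distillation_loss(student, teacher, targets)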


Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Wang, Zeqing, Ma, Qingyang, Wan, Wentao, Li, Haojie, Wang, Keze, Tian, Yonghong

arXiv.org Artificial Intelligence

Recent improvements in visual synthesis have significantly enhanced the depiction of generated human photos, which are pivotal due to their wide applicability and demand. Nonetheless, existing text-to-image and text-to-video models often generate low-quality human photos that differ considerably from real-world body structures, referred to as "abnormal human bodies". Such abnormalities, typically deemed unacceptable, pose considerable challenges for their detection and repair within human photos. These challenges require precise abnormality recognition capabilities that pinpoint both the location and the type of abnormality. Intuitively, Visual Language Models (VLMs), which have obtained remarkable performance on various visual tasks, seem well suited to this task. However, their performance on abnormality detection in human photos is quite poor. Hence, it is important to highlight this task for the research community. In this paper, we first introduce a simple yet challenging task, Fine-grained Human-body Abnormality Detection (FHAD), and construct two high-quality datasets for evaluation. Then, we propose a meticulous framework, named HumanCalibrator, which identifies and repairs abnormalities in human body structures while preserving the other content. Experiments indicate that HumanCalibrator achieves high accuracy in abnormality detection and improves results in visual comparisons while preserving the other visual content.
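
Purely as an illustration of the detect-then-repair loop HumanCalibrator is described as performing, the sketch below localizes an abnormal body region, masks it, and regenerates only that region while leaving the rest of the photo untouched. The detector, inpainter, and stopping rule are placeholders, not the paper's models.

import numpy as np

def region_mask(shape, box):
    """Binary mask that is 1 inside the given (x1, y1, x2, y2) box."""
    mask = np.zeros(shape[:2], dtype=np.uint8)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = 1
    return mask

def calibrate_human_photo(image, detector, inpainter, max_rounds=3):
    for _ in range(max_rounds):
        findings = detector(image)          # e.g. [{"box": (x1, y1, x2, y2), "type": "extra finger"}]
        if not findings:
            break                           # nothing abnormal left; stop early
        for finding in findings:
            mask = region_mask(image.shape, finding["box"])
            image = inpainter(image, mask, finding["type"])   # regenerate only the masked region
    return image

# Toy usage: a detector that flags nothing and an inpainter that returns the image unchanged.
photo = np.zeros((256, 256, 3), dtype=np.uint8)
repaired = calibrate_human_photo(photo, detector=lambda img: [], inpainter=lambda img, m, t: img)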


Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping

Zhang, Haoxi, Zhang, Xinxu, Lin, Yuanxin, Wang, Maiqi, Lai, Yi, Wang, Yu, Yu, Linfeng, Xu, Yufeng, Cheng, Ran, Szczerbicki, Edward

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has achieved significant progress due to recent rapid advances in deep learning techniques [1]. Through deep learning's powerful automated feature extraction capacities, which enable the detection of nuanced multidimensional patterns in medical images beyond human discernment, AI holds considerable potential to unlock substantial improvements in the medical imaging domain [2][3]. However, the integration of AI technologies into real-world clinical settings remains severely limited. A major obstacle is the predominant opacity of state-of-the-art AI systems, which frequently manifest as inscrutable "black-box" models that provide no actionable evidence or confidence metrics to justify their decisions and predictions [4]. This lack of model explainability or interpretability not only hinders scientific understanding of system behavior for imaging analysis tasks, but also critically erodes clinician and patient trust. Karyotyping is a vital cytogenetic task for detecting genetic abnormalities by analyzing metaphase chromosome images [5]. The process involves first preparing a complete set of microphotographed metaphase chromosomes from cell samples, which entails properly segmenting, classifying, and pairing the 23 chromosome types into homologous pairs to produce a karyogram (see Figure 1). The karyogram is then carefully analyzed by clinical cytogeneticists to identify any anomalies [6][7].
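
The karyotyping workflow sketched above (segment, classify, pair) can be summarized schematically as follows. The segmenter and classifier are placeholders, and the simple count check only illustrates one kind of numerical-abnormality screening a cytogeneticist performs on the karyogram.

from collections import defaultdict

CHROMOSOME_CLASSES = [str(i) for i in range(1, 23)] + ["X", "Y"]   # autosomes 1-22 plus sex chromosomes

def build_karyogram(metaphase_image, segmenter, classifier):
    chromosomes = segmenter(metaphase_image)    # list of single-chromosome crops
    karyogram = defaultdict(list)
    for crop in chromosomes:
        label = classifier(crop)                # one of CHROMOSOME_CLASSES
        karyogram[label].append(crop)
    return dict(karyogram)                      # e.g. {"1": [crop, crop], ..., "X": [crop, crop]}

def flag_count_anomalies(karyogram):
    """Flag autosomes that do not appear exactly twice (a simple numerical check)."""
    return {c: len(v) for c, v in karyogram.items()
            if c not in ("X", "Y") and len(v) != 2}

# Toy usage: three crops all classified as chromosome 21 are flagged as a count anomaly.
karyogram = build_karyogram(None, segmenter=lambda img: ["crop"] * 3, classifier=lambda crop: "21")
print(flag_count_anomalies(karyogram))   # {'21': 3}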


A Compact LSTM-SVM Fusion Model for Long-Duration Cardiovascular Diseases Detection

Wu, Siyang

arXiv.org Artificial Intelligence

Globally, cardiovascular diseases (CVDs) are the leading cause of mortality, accounting for an estimated 17.9 million deaths annually. One critical clinical objective is the early detection of CVDs using electrocardiogram (ECG) data, an area that has received significant attention from the research community. Recent advancements based on machine learning and deep learning have achieved great progress in this domain. However, existing methodologies exhibit inherent limitations, including inappropriate model evaluations and instances of data leakage. In this study, we present a streamlined workflow paradigm that preprocesses ECG signals into consistent 10-second segments, eliminating the need for manual feature extraction and beat detection. We also propose a hybrid model that combines Long Short-Term Memory (LSTM) with a Support Vector Machine (SVM) for CVD detection. The architecture consists of two LSTM layers and an SVM classifier, and achieves state-of-the-art results with an average precision score of 0.9402 on the MIT-BIH Arrhythmia dataset and 0.9563 on the MIT-BIH Atrial Fibrillation dataset. Based on these results, we believe our method can significantly benefit the early detection and management of CVDs.
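
A minimal sketch of the described hybrid, assuming a two-layer LSTM that encodes a fixed-length 10-second ECG segment and an SVM that classifies the final hidden state. The sampling rate, hidden size, and the untrained toy encoder are assumptions for illustration, not the paper's configuration.

import torch
import torch.nn as nn
from sklearn.svm import SVC

FS = 360                      # assumed sampling frequency (Hz); 10 s -> 3600 samples
SEG_LEN = 10 * FS

class ECGEncoder(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)    # two stacked LSTM layers

    def forward(self, x):                    # x: (batch, SEG_LEN)
        out, _ = self.lstm(x.unsqueeze(-1))  # (batch, SEG_LEN, hidden)
        return out[:, -1, :]                 # last time step as the segment embedding

encoder = ECGEncoder()
segments = torch.randn(32, SEG_LEN)          # 32 synthetic 10-second segments
labels = torch.tensor([0, 1] * 16)           # toy labels: 0 = normal rhythm, 1 = abnormal

with torch.no_grad():
    features = encoder(segments).numpy()     # in practice the LSTM would be trained first

svm = SVC(kernel="rbf", probability=True).fit(features, labels.numpy())
scores = svm.predict_proba(features)[:, 1]   # per-segment abnormality score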


MDF-Net for abnormality detection by fusing X-rays with clinical data

Hsieh, Chihcheng, Nobre, Isabel Blanco, Sousa, Sandra Costa, Ouyang, Chun, Brereton, Margot, Nascimento, Jacinto C., Jorge, Joaquim, Moreira, Catarina

arXiv.org Artificial Intelligence

This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease localization in chest X-ray images. Although current classifiers achieve high performance using chest X-ray images alone, our interviews with radiologists indicate that clinical data is highly informative and essential for interpreting images and making proper diagnoses. In this work, we propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data (structured data) and chest X-rays (image data). Since these data modalities lie in different dimensional spaces, we propose a spatial arrangement strategy, spatialization, to facilitate the multimodal learning process in a Mask R-CNN model. We performed an extensive experimental evaluation using MIMIC-Eye, a dataset comprising three modalities: MIMIC-CXR (chest X-ray images), MIMIC-IV-ED (patients' clinical data), and REFLACX (annotations of disease locations in chest X-rays). Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves disease localization in chest X-rays by 12% in terms of Average Precision compared to a standard Mask R-CNN using only chest X-rays. Further ablation studies also emphasize the importance of multimodal DL architectures and the incorporation of patients' clinical data in disease localization. The architecture proposed in this work is publicly available to promote the scientific reproducibility of our study (https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).
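
A hedged sketch of the spatialization idea described above: project a patient's tabular clinical features into a small feature map so they can be concatenated channel-wise with convolutional image features before detection. The dimensions and the concatenation-based fusion are illustrative assumptions, not the MDF-Net implementation.

import torch
import torch.nn as nn

class Spatializer(nn.Module):
    def __init__(self, n_clinical, out_channels=32, spatial_size=16):
        super().__init__()
        self.out_channels = out_channels
        self.spatial_size = spatial_size
        self.project = nn.Linear(n_clinical, out_channels * spatial_size * spatial_size)

    def forward(self, clinical):             # clinical: (B, n_clinical)
        x = self.project(clinical)
        return x.view(-1, self.out_channels, self.spatial_size, self.spatial_size)

image_features = torch.randn(2, 256, 16, 16)   # backbone feature map from the chest X-ray
clinical = torch.randn(2, 12)                  # e.g. vitals and demographics from the ED record
spatial_clinical = Spatializer(n_clinical=12)(clinical)
fused = torch.cat([image_features, spatial_clinical], dim=1)   # (2, 288, 16, 16) fused feature map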